大数据开发之Hive篇13 您所在的位置:网站首页 hive 建表索引 大数据开发之Hive篇13

大数据开发之Hive篇13

2024-03-25 19:06| 来源: 网络整理| 查看: 265

备注: Hive 版本 2.1.1

文章目录 一.Hive索引简介二.Hive 索引案例2.1 索引语法2.2 索引相关操作2.3 创建索引案例2.4 使用索引案例

一.Hive索引简介

Hive索引的目标是提高对表的某些列进行查询查找的速度。如果没有索引,则使用类似于“WHERE tab1.col1 = 10”这样的谓词进行查询’加载整个表或分区并处理所有行。但是如果col1存在索引,那么只需要加载和处理文件的一部分。

索引所能提供的查询速度的提高是以创建索引和存储索引的磁盘空间的额外处理为代价的。

Hive 3.0开始将 移除index的功能,取而代之的是Hive 2.3版本开始的物化视图,自动重写的物化视图替代了index的功能。

索引是传统RDBMS的标准技术,用来加速查询 Hive自0.7版本开始支持索引,但提供的功能很有限,效率也并不高,因此Hive索引很少使用

Hive索引原理: 1) 在指定列上建立索引,生成一张索引表(Hive的一张物理表),记录以下三个字段:索引列的值、该值对应的HDFS文件路径、该值在文件中的偏移量 2 ) 在执行索引字段查询时候,首先额外生成一个MapReduce job,根据对索引列的过滤条件,从索引表中过滤出索引列的值对应的hdfs文件路径及偏移量,输出到hdfs上的一个文件中,然后根据这些文件中的 hdfs路径和偏移量,筛选原始input文件,生成新的split,作为整个job的split,达到不用全表扫描的目的

Hive索引的弊端: 每次查询时候都要先用一个job扫描索引表,如果索引列的值非常稀疏,那么索引表本身也会非常大 索引表不会自动rebuild,如果表有数据新增或删除,那么必须手动rebuild索引表数据

二.Hive 索引案例 2.1 索引语法

语法:

-- 创建索引 CREATE INDEX index_name ON TABLE base_table_name (col_name, ...) AS 'index.handler.class.name' [WITH DEFERRED REBUILD] [IDXPROPERTIES (property_name=property_value, ...)] [IN TABLE index_table_name] [PARTITIONED BY (col_name, ...)] [ [ ROW FORMAT ...] STORED AS ... | STORED BY ... ] [LOCATION hdfs_path] [TBLPROPERTIES (...)] [COMMENT "index comment"] -- 删除索引 DROP INDEX [IF EXISTS] index_name ON table_name; -- 更改索引属性 ALTER INDEX index_name ON table_name [PARTITION partition_spec] REBUILD; 2.2 索引相关操作 创建/构建,查看和删除索引 CREATE INDEX table01_index ON TABLE table01 (column2) AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'; SHOW INDEX ON table01; DROP INDEX table01_index ON table01; 创建和构建,指定列格式化查看和删除索引 CREATE INDEX table02_index ON TABLE table02 (column3) AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler' WITH DEFERRED REBUILD; ALTER INDEX table02_index ON table2 REBUILD; SHOW FORMATTED INDEX ON table02; DROP INDEX table02_index ON table02; 创建位图,查看和删除索引 CREATE INDEX table03_index ON TABLE table03 (column4) AS 'BITMAP' WITH DEFERRED REBUILD; ALTER INDEX table03_index ON table03 REBUILD; SHOW FORMATTED INDEX ON table03; DROP INDEX table03_index ON table03; 在新表中创建索引 CREATE INDEX table03_index ON TABLE table03 (column4) AS 'BITMAP' WITH DEFERRED REBUILD; ALTER INDEX table03_index ON table03 REBUILD; SHOW FORMATTED INDEX ON table03; DROP INDEX table03_index ON table03; 创建rc文件格式的索引 CREATE INDEX table05_index ON TABLE table05 (column6) AS 'COMPACT' STORED AS RCFILE; 创建textfile文件格式的索引 CREATE INDEX table06_index ON TABLE table06 (column7) AS 'COMPACT' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE; 创建索引带索引属性 CREATE INDEX table07_index ON TABLE table07 (column8) AS 'COMPACT' IDXPROPERTIES ("prop1"="value1", "prop2"="value2"); 创建索引带表属性 CREATE INDEX table08_index ON TABLE table08 (column9) AS 'COMPACT' TBLPROPERTIES ("prop3"="value3", "prop4"="value4"); 如存在删除索引 DROP INDEX IF EXISTS table09_index ON table09; 在一个分区重建索引 ALTER INDEX table10_index ON table10 PARTITION (columnX='valueQ', columnY='valueR') REBUILD; 2.3 创建索引案例

代码:

create index ods_fact_sale_idx02 on table ods_fact_sale (id) as 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler' with deferred rebuild; alter index ods_fact_sale_idx02 on ods_fact_sale rebuild; SHOW FORMATTED INDEX ON ODS_FACT_SALE; DROP INDEX ods_fact_sale_idx02 on ods_fact_sale;

测试记录:

hive> > DROP INDEX ods_fact_sale_idx02 on ods_fact_sale; OK Time taken: 0.126 seconds hive> > create index ods_fact_sale_idx02 on table ods_fact_sale (id) as 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler' with deferred rebuild; OK Time taken: 0.255 seconds hive> alter index ods_fact_sale_idx02 on ods_fact_sale rebuild; Query ID = root_20201222153341_45c09d53-2fe9-4316-8ef4-6b36f79242d5 Total jobs = 1 Launching Job 1 out of 1 Number of reduce tasks not specified. Estimated from input data size: 469 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer= In order to limit the maximum number of reducers: set hive.exec.reducers.max= In order to set a constant number of reducers: set mapreduce.job.reduces= Starting Job = job_1606698967173_0354, Tracking URL = http://hp1:8088/proxy/application_1606698967173_0354/ Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job -kill job_1606698967173_0354 Hadoop job information for Stage-1: number of mappers: 117; number of reducers: 469 2020-12-22 15:33:50,110 Stage-1 map = 0%, reduce = 0% 2020-12-22 15:34:13,830 Stage-1 map = 1%, reduce = 0%, Cumulative CPU 61.66 sec 2020-12-22 15:34:32,360 Stage-1 map = 2%, reduce = 0%, Cumulative CPU 118.75 sec 2020-12-22 15:35:00,105 Stage-1 map = 3%, reduce = 0%, Cumulative CPU 194.41 sec 2020-12-22 15:35:30,915 Stage-1 map = 4%, reduce = 0%, Cumulative CPU 288.08 sec 2020-12-22 15:35:46,308 Stage-1 map = 5%, reduce = 0%, Cumulative CPU 338.34 sec 2020-12-22 15:36:10,946 Stage-1 map = 6%, reduce = 0%, Cumulative CPU 409.04 sec 2020-12-22 15:36:23,270 Stage-1 map = 7%, reduce = 0%, Cumulative CPU 443.45 sec 2020-12-22 15:36:50,995 Stage-1 map = 8%, reduce = 0%, Cumulative CPU 529.32 sec 2020-12-22 15:37:04,332 Stage-1 map = 9%, reduce = 0%, Cumulative CPU 571.73 sec 2020-12-22 15:37:31,002 Stage-1 map = 10%, reduce = 0%, Cumulative CPU 648.36 sec 2020-12-22 15:37:58,701 Stage-1 map = 11%, reduce = 0%, Cumulative CPU 735.16 sec 2020-12-22 15:38:09,966 Stage-1 map = 12%, reduce = 0%, Cumulative CPU 769.21 sec 2020-12-22 15:38:37,652 Stage-1 map = 13%, reduce = 0%, Cumulative CPU 850.71 sec 2020-12-22 15:38:49,957 Stage-1 map = 14%, reduce = 0%, Cumulative CPU 892.39 sec 2020-12-22 15:39:16,572 Stage-1 map = 15%, reduce = 0%, Cumulative CPU 968.21 sec 2020-12-22 15:39:45,276 Stage-1 map = 16%, reduce = 0%, Cumulative CPU 1059.42 sec 2020-12-22 15:39:55,520 Stage-1 map = 17%, reduce = 0%, Cumulative CPU 1084.93 sec 2020-12-22 15:40:25,223 Stage-1 map = 18%, reduce = 0%, Cumulative CPU 1173.97 sec 2020-12-22 15:40:38,550 Stage-1 map = 19%, reduce = 0%, Cumulative CPU 1217.52 sec 2020-12-22 15:41:12,398 Stage-1 map = 20%, reduce = 0%, Cumulative CPU 1311.11 sec 2020-12-22 15:41:22,641 Stage-1 map = 21%, reduce = 0%, Cumulative CPU 1344.35 sec 2020-12-22 15:41:49,273 Stage-1 map = 22%, reduce = 0%, Cumulative CPU 1421.17 sec 2020-12-22 15:42:15,892 Stage-1 map = 23%, reduce = 0%, Cumulative CPU 1506.27 sec 2020-12-22 15:42:28,190 Stage-1 map = 24%, reduce = 0%, Cumulative CPU 1540.37 sec 2020-12-22 15:42:54,829 Stage-1 map = 25%, reduce = 0%, Cumulative CPU 1625.41 sec 2020-12-22 15:43:07,140 Stage-1 map = 26%, reduce = 0%, Cumulative CPU 1667.05 sec 2020-12-22 15:43:33,773 Stage-1 map = 27%, reduce = 0%, Cumulative CPU 1743.24 sec 2020-12-22 15:44:04,508 Stage-1 map = 28%, reduce = 0%, Cumulative CPU 1836.61 sec 2020-12-22 15:44:13,725 Stage-1 map = 29%, reduce = 0%, Cumulative CPU 1861.45 sec 2020-12-22 15:44:43,418 Stage-1 map = 30%, reduce = 0%, Cumulative CPU 1950.29 sec 2020-12-22 15:44:57,770 Stage-1 map = 31%, reduce = 0%, Cumulative CPU 1989.13 sec 2020-12-22 15:45:24,410 Stage-1 map = 32%, reduce = 0%, Cumulative CPU 2072.34 sec 2020-12-22 15:45:54,127 Stage-1 map = 33%, reduce = 0%, Cumulative CPU 2165.23 sec 2020-12-22 15:46:05,399 Stage-1 map = 34%, reduce = 0%, Cumulative CPU 2191.15 sec 2020-12-22 15:46:35,076 Stage-1 map = 35%, reduce = 0%, Cumulative CPU 2281.29 sec 2020-12-22 15:46:41,231 Stage-1 map = 36%, reduce = 0%, Cumulative CPU 2306.47 sec 2020-12-22 15:47:17,071 Stage-1 map = 37%, reduce = 0%, Cumulative CPU 2409.37 sec 2020-12-22 15:47:22,199 Stage-1 map = 38%, reduce = 0%, Cumulative CPU 2436.0 sec 2020-12-22 15:47:51,918 Stage-1 map = 39%, reduce = 0%, Cumulative CPU 2519.65 sec 2020-12-22 15:48:18,555 Stage-1 map = 40%, reduce = 0%, Cumulative CPU 2602.14 sec 2020-12-22 15:48:33,925 Stage-1 map = 41%, reduce = 0%, Cumulative CPU 2644.48 sec 2020-12-22 15:49:01,573 Stage-1 map = 42%, reduce = 0%, Cumulative CPU 2728.82 sec 2020-12-22 15:49:14,890 Stage-1 map = 43%, reduce = 0%, Cumulative CPU 2769.67 sec 2020-12-22 15:49:39,457 Stage-1 map = 44%, reduce = 0%, Cumulative CPU 2844.61 sec 2020-12-22 15:50:08,161 Stage-1 map = 45%, reduce = 0%, Cumulative CPU 2918.23 sec 2020-12-22 15:50:20,453 Stage-1 map = 46%, reduce = 0%, Cumulative CPU 2966.91 sec 2020-12-22 15:50:52,200 Stage-1 map = 47%, reduce = 0%, Cumulative CPU 3065.02 sec 2020-12-22 15:51:01,420 Stage-1 map = 48%, reduce = 0%, Cumulative CPU 3091.15 sec 2020-12-22 15:51:31,093 Stage-1 map = 49%, reduce = 0%, Cumulative CPU 3180.02 sec 2020-12-22 15:51:42,352 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 3219.38 sec 2020-12-22 15:52:02,842 Stage-1 map = 51%, reduce = 0%, Cumulative CPU 3282.47 sec 2020-12-22 15:52:25,360 Stage-1 map = 52%, reduce = 0%, Cumulative CPU 3346.31 sec 2020-12-22 15:52:49,915 Stage-1 map = 53%, reduce = 0%, Cumulative CPU 3412.03 sec 2020-12-22 15:53:09,353 Stage-1 map = 54%, reduce = 0%, Cumulative CPU 3466.34 sec 2020-12-22 15:53:39,022 Stage-1 map = 55%, reduce = 0%, Cumulative CPU 3559.84 sec 2020-12-22 15:53:55,424 Stage-1 map = 56%, reduce = 0%, Cumulative CPU 3600.18 sec 2020-12-22 15:54:16,894 Stage-1 map = 57%, reduce = 0%, Cumulative CPU 3677.0 sec 2020-12-22 15:54:40,450 Stage-1 map = 58%, reduce = 0%, Cumulative CPU 3742.23 sec 2020-12-22 15:55:06,022 Stage-1 map = 59%, reduce = 0%, Cumulative CPU 3809.95 sec 2020-12-22 15:55:27,531 Stage-1 map = 60%, reduce = 0%, Cumulative CPU 3872.88 sec 2020-12-22 15:55:46,970 Stage-1 map = 61%, reduce = 0%, Cumulative CPU 3949.54 sec 2020-12-22 15:56:11,541 Stage-1 map = 62%, reduce = 0%, Cumulative CPU 4019.95 sec 2020-12-22 15:56:35,074 Stage-1 map = 63%, reduce = 0%, Cumulative CPU 4079.36 sec 2020-12-22 15:56:50,434 Stage-1 map = 64%, reduce = 0%, Cumulative CPU 4134.21 sec 2020-12-22 15:57:17,045 Stage-1 map = 65%, reduce = 0%, Cumulative CPU 4218.52 sec 2020-12-22 15:57:29,327 Stage-1 map = 66%, reduce = 0%, Cumulative CPU 4250.03 sec 2020-12-22 15:57:59,029 Stage-1 map = 67%, reduce = 0%, Cumulative CPU 4343.61 sec 2020-12-22 15:58:14,383 Stage-1 map = 68%, reduce = 0%, Cumulative CPU 4372.78 sec 2020-12-22 15:58:39,990 Stage-1 map = 69%, reduce = 0%, Cumulative CPU 4457.12 sec 2020-12-22 15:59:03,531 Stage-1 map = 70%, reduce = 0%, Cumulative CPU 4516.39 sec 2020-12-22 15:59:20,938 Stage-1 map = 71%, reduce = 0%, Cumulative CPU 4579.95 sec 2020-12-22 15:59:47,535 Stage-1 map = 72%, reduce = 0%, Cumulative CPU 4662.51 sec 2020-12-22 16:00:00,865 Stage-1 map = 73%, reduce = 0%, Cumulative CPU 4704.85 sec 2020-12-22 16:00:28,513 Stage-1 map = 74%, reduce = 0%, Cumulative CPU 4781.9 sec 2020-12-22 16:00:58,207 Stage-1 map = 75%, reduce = 0%, Cumulative CPU 4859.82 sec 2020-12-22 16:01:12,536 Stage-1 map = 76%, reduce = 0%, Cumulative CPU 4915.46 sec 2020-12-22 16:01:40,186 Stage-1 map = 77%, reduce = 0%, Cumulative CPU 5000.19 sec 2020-12-22 16:01:49,392 Stage-1 map = 78%, reduce = 0%, Cumulative CPU 5032.13 sec 2020-12-22 16:02:17,051 Stage-1 map = 79%, reduce = 0%, Cumulative CPU 5116.37 sec 2020-12-22 16:02:46,743 Stage-1 map = 80%, reduce = 0%, Cumulative CPU 5185.63 sec 2020-12-22 16:03:01,095 Stage-1 map = 81%, reduce = 0%, Cumulative CPU 5241.31 sec 2020-12-22 16:03:34,886 Stage-1 map = 82%, reduce = 0%, Cumulative CPU 5312.93 sec 2020-12-22 16:04:10,729 Stage-1 map = 83%, reduce = 0%, Cumulative CPU 5368.73 sec 2020-12-22 16:04:45,541 Stage-1 map = 84%, reduce = 0%, Cumulative CPU 5422.23 sec 2020-12-22 16:05:26,495 Stage-1 map = 85%, reduce = 0%, Cumulative CPU 5485.99 sec 2020-12-22 16:06:25,883 Stage-1 map = 86%, reduce = 0%, Cumulative CPU 5576.06 sec 2020-12-22 16:07:00,664 Stage-1 map = 87%, reduce = 0%, Cumulative CPU 5630.26 sec 2020-12-22 16:07:43,677 Stage-1 map = 88%, reduce = 0%, Cumulative CPU 5692.55 sec 2020-12-22 16:08:24,613 Stage-1 map = 89%, reduce = 0%, Cumulative CPU 5752.85 sec 2020-12-22 16:09:05,589 Stage-1 map = 90%, reduce = 0%, Cumulative CPU 5814.03 sec 2020-12-22 16:09:40,413 Stage-1 map = 91%, reduce = 0%, Cumulative CPU 5869.31 sec 2020-12-22 16:10:40,832 Stage-1 map = 91%, reduce = 0%, Cumulative CPU 5953.63 sec 2020-12-22 16:10:44,928 Stage-1 map = 92%, reduce = 0%, Cumulative CPU 5961.29 sec 2020-12-22 16:11:25,875 Stage-1 map = 93%, reduce = 0%, Cumulative CPU 6024.9 sec 2020-12-22 16:12:07,848 Stage-1 map = 94%, reduce = 0%, Cumulative CPU 6088.08 sec 2020-12-22 16:12:42,648 Stage-1 map = 95%, reduce = 0%, Cumulative CPU 6142.66 sec 2020-12-22 16:13:23,613 Stage-1 map = 96%, reduce = 0%, Cumulative CPU 6206.25 sec 2020-12-22 16:14:01,477 Stage-1 map = 97%, reduce = 0%, Cumulative CPU 6263.55 sec 2020-12-22 16:14:57,801 Stage-1 map = 98%, reduce = 0%, Cumulative CPU 6347.73 sec 2020-12-22 16:15:33,637 Stage-1 map = 99%, reduce = 0%, Cumulative CPU 6401.32 sec 2020-12-22 16:16:28,934 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 6484.47 sec 2020-12-22 16:16:44,291 Stage-1 map = 100%, reduce = 1%, Cumulative CPU 6520.9 sec 2020-12-22 16:17:05,796 Stage-1 map = 100%, reduce = 2%, Cumulative CPU 6582.22 sec 2020-12-22 16:17:24,237 Stage-1 map = 100%, reduce = 3%, Cumulative CPU 6631.31 sec 2020-12-22 16:17:46,772 Stage-1 map = 100%, reduce = 4%, Cumulative CPU 6692.69 sec 2020-12-22 16:18:09,259 Stage-1 map = 100%, reduce = 5%, Cumulative CPU 6756.62 sec 2020-12-22 16:18:28,730 Stage-1 map = 100%, reduce = 6%, Cumulative CPU 6806.64 sec 2020-12-22 16:18:53,284 Stage-1 map = 100%, reduce = 7%, Cumulative CPU 6869.09 sec 2020-12-22 16:19:13,775 Stage-1 map = 100%, reduce = 8%, Cumulative CPU 6932.39 sec 2020-12-22 16:19:33,219 Stage-1 map = 100%, reduce = 9%, Cumulative CPU 6980.84 sec 2020-12-22 16:19:55,741 Stage-1 map = 100%, reduce = 10%, Cumulative CPU 7042.75 sec 2020-12-22 16:20:20,321 Stage-1 map = 100%, reduce = 11%, Cumulative CPU 7106.03 sec 2020-12-22 16:20:37,736 Stage-1 map = 100%, reduce = 12%, Cumulative CPU 7155.75 sec 2020-12-22 16:20:59,240 Stage-1 map = 100%, reduce = 13%, Cumulative CPU 7218.24 sec 2020-12-22 16:21:23,794 Stage-1 map = 100%, reduce = 14%, Cumulative CPU 7280.81 sec 2020-12-22 16:21:44,279 Stage-1 map = 100%, reduce = 15%, Cumulative CPU 7341.54 sec 2020-12-22 16:22:02,715 Stage-1 map = 100%, reduce = 16%, Cumulative CPU 7391.08 sec 2020-12-22 16:22:27,287 Stage-1 map = 100%, reduce = 17%, Cumulative CPU 7453.5 sec 2020-12-22 16:22:48,813 Stage-1 map = 100%, reduce = 18%, Cumulative CPU 7515.72 sec 2020-12-22 16:23:06,232 Stage-1 map = 100%, reduce = 19%, Cumulative CPU 7565.05 sec 2020-12-22 16:23:30,816 Stage-1 map = 100%, reduce = 20%, Cumulative CPU 7628.2 sec 2020-12-22 16:23:52,322 Stage-1 map = 100%, reduce = 21%, Cumulative CPU 7691.45 sec 2020-12-22 16:24:10,772 Stage-1 map = 100%, reduce = 22%, Cumulative CPU 7742.07 sec 2020-12-22 16:24:33,332 Stage-1 map = 100%, reduce = 23%, Cumulative CPU 7803.63 sec 2020-12-22 16:24:55,863 Stage-1 map = 100%, reduce = 24%, Cumulative CPU 7865.39 sec 2020-12-22 16:25:14,322 Stage-1 map = 100%, reduce = 25%, Cumulative CPU 7915.86 sec 2020-12-22 16:25:36,863 Stage-1 map = 100%, reduce = 26%, Cumulative CPU 7978.59 sec 2020-12-22 16:25:58,383 Stage-1 map = 100%, reduce = 27%, Cumulative CPU 8039.64 sec 2020-12-22 16:26:16,826 Stage-1 map = 100%, reduce = 28%, Cumulative CPU 8090.56 sec 2020-12-22 16:26:40,365 Stage-1 map = 100%, reduce = 29%, Cumulative CPU 8153.82 sec 2020-12-22 16:27:01,879 Stage-1 map = 100%, reduce = 30%, Cumulative CPU 8215.84 sec 2020-12-22 16:27:25,454 Stage-1 map = 100%, reduce = 31%, Cumulative CPU 8276.43 sec 2020-12-22 16:27:42,863 Stage-1 map = 100%, reduce = 32%, Cumulative CPU 8325.88 sec 2020-12-22 16:28:05,402 Stage-1 map = 100%, reduce = 33%, Cumulative CPU 8391.38 sec 2020-12-22 16:28:28,991 Stage-1 map = 100%, reduce = 34%, Cumulative CPU 8454.08 sec 2020-12-22 16:28:47,436 Stage-1 map = 100%, reduce = 35%, Cumulative CPU 8504.74 sec 2020-12-22 16:29:09,986 Stage-1 map = 100%, reduce = 36%, Cumulative CPU 8567.02 sec 2020-12-22 16:29:33,571 Stage-1 map = 100%, reduce = 37%, Cumulative CPU 8627.9 sec 2020-12-22 16:29:51,003 Stage-1 map = 100%, reduce = 38%, Cumulative CPU 8677.2 sec 2020-12-22 16:30:13,552 Stage-1 map = 100%, reduce = 39%, Cumulative CPU 8739.33 sec 2020-12-22 16:30:36,087 Stage-1 map = 100%, reduce = 40%, Cumulative CPU 8802.55 sec 2020-12-22 16:30:54,539 Stage-1 map = 100%, reduce = 41%, Cumulative CPU 8851.75 sec 2020-12-22 16:31:17,098 Stage-1 map = 100%, reduce = 42%, Cumulative CPU 8914.12 sec 2020-12-22 16:31:39,653 Stage-1 map = 100%, reduce = 43%, Cumulative CPU 8976.57 sec 2020-12-22 16:32:02,203 Stage-1 map = 100%, reduce = 44%, Cumulative CPU 9038.99 sec 2020-12-22 16:32:20,654 Stage-1 map = 100%, reduce = 45%, Cumulative CPU 9089.78 sec 2020-12-22 16:32:43,206 Stage-1 map = 100%, reduce = 46%, Cumulative CPU 9154.28 sec 2020-12-22 16:33:06,798 Stage-1 map = 100%, reduce = 47%, Cumulative CPU 9217.02 sec 2020-12-22 16:33:25,263 Stage-1 map = 100%, reduce = 48%, Cumulative CPU 9264.79 sec 2020-12-22 16:33:46,816 Stage-1 map = 100%, reduce = 49%, Cumulative CPU 9326.92 sec 2020-12-22 16:34:11,410 Stage-1 map = 100%, reduce = 50%, Cumulative CPU 9389.93 sec 2020-12-22 16:34:29,847 Stage-1 map = 100%, reduce = 51%, Cumulative CPU 9440.26 sec 2020-12-22 16:34:52,404 Stage-1 map = 100%, reduce = 52%, Cumulative CPU 9502.12 sec 2020-12-22 16:35:14,992 Stage-1 map = 100%, reduce = 53%, Cumulative CPU 9563.36 sec 2020-12-22 16:35:34,477 Stage-1 map = 100%, reduce = 54%, Cumulative CPU 9613.78 sec 2020-12-22 16:35:58,060 Stage-1 map = 100%, reduce = 55%, Cumulative CPU 9675.66 sec 2020-12-22 16:36:20,623 Stage-1 map = 100%, reduce = 56%, Cumulative CPU 9738.18 sec 2020-12-22 16:36:39,069 Stage-1 map = 100%, reduce = 57%, Cumulative CPU 9787.32 sec 2020-12-22 16:37:01,654 Stage-1 map = 100%, reduce = 58%, Cumulative CPU 9849.5 sec 2020-12-22 16:37:24,207 Stage-1 map = 100%, reduce = 59%, Cumulative CPU 9911.15 sec 2020-12-22 16:37:45,743 Stage-1 map = 100%, reduce = 60%, Cumulative CPU 9973.2 sec 2020-12-22 16:38:04,194 Stage-1 map = 100%, reduce = 61%, Cumulative CPU 10022.28 sec 2020-12-22 16:38:27,756 Stage-1 map = 100%, reduce = 62%, Cumulative CPU 10084.13 sec 2020-12-22 16:38:49,296 Stage-1 map = 100%, reduce = 63%, Cumulative CPU 10143.47 sec 2020-12-22 16:39:07,755 Stage-1 map = 100%, reduce = 64%, Cumulative CPU 10194.02 sec 2020-12-22 16:39:31,358 Stage-1 map = 100%, reduce = 65%, Cumulative CPU 10254.1 sec 2020-12-22 16:39:52,885 Stage-1 map = 100%, reduce = 66%, Cumulative CPU 10315.81 sec 2020-12-22 16:40:10,353 Stage-1 map = 100%, reduce = 67%, Cumulative CPU 10366.31 sec 2020-12-22 16:40:34,963 Stage-1 map = 100%, reduce = 68%, Cumulative CPU 10427.09 sec 2020-12-22 16:40:57,525 Stage-1 map = 100%, reduce = 69%, Cumulative CPU 10488.52 sec 2020-12-22 16:41:15,979 Stage-1 map = 100%, reduce = 70%, Cumulative CPU 10538.81 sec 2020-12-22 16:41:38,569 Stage-1 map = 100%, reduce = 71%, Cumulative CPU 10601.0 sec 2020-12-22 16:42:01,113 Stage-1 map = 100%, reduce = 72%, Cumulative CPU 10661.02 sec 2020-12-22 16:42:24,693 Stage-1 map = 100%, reduce = 73%, Cumulative CPU 10724.19 sec 2020-12-22 16:42:44,185 Stage-1 map = 100%, reduce = 74%, Cumulative CPU 10774.42 sec 2020-12-22 16:43:03,667 Stage-1 map = 100%, reduce = 75%, Cumulative CPU 10835.19 sec 2020-12-22 16:43:28,281 Stage-1 map = 100%, reduce = 76%, Cumulative CPU 10897.09 sec 2020-12-22 16:43:46,763 Stage-1 map = 100%, reduce = 77%, Cumulative CPU 10946.29 sec 2020-12-22 16:44:06,238 Stage-1 map = 100%, reduce = 78%, Cumulative CPU 11008.59 sec 2020-12-22 16:44:31,881 Stage-1 map = 100%, reduce = 79%, Cumulative CPU 11070.09 sec 2020-12-22 16:44:50,338 Stage-1 map = 100%, reduce = 80%, Cumulative CPU 11119.49 sec 2020-12-22 16:45:10,864 Stage-1 map = 100%, reduce = 81%, Cumulative CPU 11182.32 sec 2020-12-22 16:45:35,471 Stage-1 map = 100%, reduce = 82%, Cumulative CPU 11244.0 sec 2020-12-22 16:45:52,893 Stage-1 map = 100%, reduce = 83%, Cumulative CPU 11293.35 sec 2020-12-22 16:46:14,420 Stage-1 map = 100%, reduce = 84%, Cumulative CPU 11356.49 sec 2020-12-22 16:46:38,045 Stage-1 map = 100%, reduce = 85%, Cumulative CPU 11419.4 sec 2020-12-22 16:46:56,528 Stage-1 map = 100%, reduce = 86%, Cumulative CPU 11468.41 sec 2020-12-22 16:47:20,132 Stage-1 map = 100%, reduce = 87%, Cumulative CPU 11531.36 sec 2020-12-22 16:47:42,714 Stage-1 map = 100%, reduce = 88%, Cumulative CPU 11593.23 sec 2020-12-22 16:48:05,312 Stage-1 map = 100%, reduce = 89%, Cumulative CPU 11654.95 sec 2020-12-22 16:48:23,791 Stage-1 map = 100%, reduce = 90%, Cumulative CPU 11704.69 sec 2020-12-22 16:48:46,388 Stage-1 map = 100%, reduce = 91%, Cumulative CPU 11766.5 sec 2020-12-22 16:49:08,944 Stage-1 map = 100%, reduce = 92%, Cumulative CPU 11829.9 sec 2020-12-22 16:49:26,402 Stage-1 map = 100%, reduce = 93%, Cumulative CPU 11878.99 sec 2020-12-22 16:49:48,986 Stage-1 map = 100%, reduce = 94%, Cumulative CPU 11941.07 sec 2020-12-22 16:50:13,611 Stage-1 map = 100%, reduce = 95%, Cumulative CPU 12004.41 sec 2020-12-22 16:50:32,065 Stage-1 map = 100%, reduce = 96%, Cumulative CPU 12054.25 sec 2020-12-22 16:50:54,635 Stage-1 map = 100%, reduce = 97%, Cumulative CPU 12116.61 sec 2020-12-22 16:51:18,221 Stage-1 map = 100%, reduce = 98%, Cumulative CPU 12180.35 sec 2020-12-22 16:51:35,661 Stage-1 map = 100%, reduce = 99%, Cumulative CPU 12230.74 sec 2020-12-22 16:52:07,473 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 12317.39 sec MapReduce Total cumulative CPU time: 0 days 3 hours 25 minutes 17 seconds 390 msec Ended Job = job_1606698967173_0354 Loading data to table test.test__ods_fact_sale_ods_fact_sale_idx02__ MapReduce Jobs Launched: Stage-Stage-1: Map: 117 Reduce: 469 Cumulative CPU: 12317.39 sec HDFS Read: 31439348928 HDFS Write: 73491333534 HDFS EC Read: 0 SUCCESS Total MapReduce CPU Time Spent: 0 days 3 hours 25 minutes 17 seconds 390 msec OK Time taken: 4709.654 seconds hive> SHOW FORMATTED INDEX ON ODS_FACT_SALE; OK idx_name tab_name col_names idx_tab_name idx_type comment idx_name tab_name col_names idx_tab_name idx_type comment ods_fact_sale_idx01 ods_fact_sale sale_date test__ods_fact_sale_ods_fact_sale_idx01__ compact ods_fact_sale_idx02 ods_fact_sale id test__ods_fact_sale_ods_fact_sale_idx02__ compact Time taken: 0.069 seconds, Fetched: 5 row(s) hive>

查看表和索引的大小 表的大小30G左右 索引的大小70G左右,id列创建索引,居然比表都大

[root@hp1 ~]# hadoop fs -ls /user/hive/warehouse/test.db/ods_fact_sale |awk -F ' ' '{print $5}'|awk '{a+=$1}END{print a}' 31421093662 [root@hp1 ~]# hadoop fs -ls /user/hive/warehouse/test.db/test__ods_fact_sale_ods_fact_sale_idx02__ |awk -F ' ' '{print $5}'|awk '{a+=$1}END{print a}' 73491280068 [root@hp1 ~]# 2.4 使用索引案例

使用索引需要调整如下参数:

-- 默认输入格式。如果遇到使用CombineHiveInputFormat的问题,将此设置为HiveInputFormat。 SET hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat; -- 是否启用索引的自动使用 默认为false SET hive.optimize.index.filter=true; -- 自动使用压缩索引的输入的最小大小(以字节为单位) SET hive.optimize.index.filter.compact.minsize=0; -- 查询能使用的最大的压缩索引的空间直接数,默认为10737418240,为负值的时候代表无穷大 set hive.index.compact.query.max.size=-1;

测试记录: 如下的测试记录,ods_fact_sale表7亿左右数据,id为主键,在hive表中创建索引,索引居然比表占的存储都大。 而且单行的查询,用索引居然比直接查询更慢,当然更快的还是spark。 这也就是hive会在后面的版本中去掉索引,用物化视图的自动查询重写来代替。

-- 未使用索引 hive> > select * from ods_fact_sale where id = 10000; Query ID = root_20201222152519_ed85b5ad-000e-41dc-a85a-7e0bbc9e0176 Total jobs = 1 Launching Job 1 out of 1 Number of reduce tasks is set to 0 since there's no reduce operator Starting Job = job_1606698967173_0353, Tracking URL = http://hp1:8088/proxy/application_1606698967173_0353/ Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job -kill job_1606698967173_0353 Hadoop job information for Stage-1: number of mappers: 117; number of reducers: 0 2020-12-22 15:25:27,751 Stage-1 map = 0%, reduce = 0% 2020-12-22 15:25:39,162 Stage-1 map = 2%, reduce = 0%, Cumulative CPU 15.37 sec 2020-12-22 15:25:46,394 Stage-1 map = 3%, reduce = 0%, Cumulative CPU 22.64 sec 2020-12-22 15:25:53,616 Stage-1 map = 4%, reduce = 0%, Cumulative CPU 37.16 sec 2020-12-22 15:25:54,665 Stage-1 map = 5%, reduce = 0%, Cumulative CPU 44.45 sec 2020-12-22 15:26:00,842 Stage-1 map = 6%, reduce = 0%, Cumulative CPU 51.79 sec 2020-12-22 15:26:01,870 Stage-1 map = 7%, reduce = 0%, Cumulative CPU 59.02 sec 2020-12-22 15:26:09,059 Stage-1 map = 9%, reduce = 0%, Cumulative CPU 73.6 sec 2020-12-22 15:26:17,276 Stage-1 map = 10%, reduce = 0%, Cumulative CPU 88.21 sec 2020-12-22 15:26:25,487 Stage-1 map = 12%, reduce = 0%, Cumulative CPU 102.7 sec 2020-12-22 15:26:33,694 Stage-1 map = 14%, reduce = 0%, Cumulative CPU 117.33 sec 2020-12-22 15:26:39,838 Stage-1 map = 15%, reduce = 0%, Cumulative CPU 124.44 sec 2020-12-22 15:26:47,012 Stage-1 map = 16%, reduce = 0%, Cumulative CPU 138.83 sec 2020-12-22 15:26:49,061 Stage-1 map = 17%, reduce = 0%, Cumulative CPU 146.02 sec 2020-12-22 15:26:54,177 Stage-1 map = 18%, reduce = 0%, Cumulative CPU 153.39 sec 2020-12-22 15:26:56,224 Stage-1 map = 19%, reduce = 0%, Cumulative CPU 160.72 sec 2020-12-22 15:27:01,364 Stage-1 map = 20%, reduce = 0%, Cumulative CPU 167.89 sec 2020-12-22 15:27:04,444 Stage-1 map = 21%, reduce = 0%, Cumulative CPU 175.21 sec 2020-12-22 15:27:12,640 Stage-1 map = 22%, reduce = 0%, Cumulative CPU 189.69 sec 2020-12-22 15:27:16,749 Stage-1 map = 23%, reduce = 0%, Cumulative CPU 196.86 sec 2020-12-22 15:27:19,825 Stage-1 map = 24%, reduce = 0%, Cumulative CPU 204.02 sec 2020-12-22 15:27:22,893 Stage-1 map = 25%, reduce = 0%, Cumulative CPU 211.12 sec 2020-12-22 15:27:25,969 Stage-1 map = 26%, reduce = 0%, Cumulative CPU 218.38 sec 2020-12-22 15:27:35,189 Stage-1 map = 27%, reduce = 0%, Cumulative CPU 232.69 sec 2020-12-22 15:27:38,248 Stage-1 map = 28%, reduce = 0%, Cumulative CPU 239.79 sec 2020-12-22 15:27:42,338 Stage-1 map = 29%, reduce = 0%, Cumulative CPU 246.99 sec 2020-12-22 15:27:45,409 Stage-1 map = 30%, reduce = 0%, Cumulative CPU 254.03 sec 2020-12-22 15:27:50,531 Stage-1 map = 31%, reduce = 0%, Cumulative CPU 261.5 sec 2020-12-22 15:27:52,579 Stage-1 map = 32%, reduce = 0%, Cumulative CPU 268.68 sec 2020-12-22 15:27:59,745 Stage-1 map = 33%, reduce = 0%, Cumulative CPU 283.25 sec 2020-12-22 15:28:06,918 Stage-1 map = 34%, reduce = 0%, Cumulative CPU 290.47 sec 2020-12-22 15:28:07,944 Stage-1 map = 35%, reduce = 0%, Cumulative CPU 297.58 sec 2020-12-22 15:28:14,091 Stage-1 map = 36%, reduce = 0%, Cumulative CPU 304.7 sec 2020-12-22 15:28:15,116 Stage-1 map = 37%, reduce = 0%, Cumulative CPU 312.05 sec 2020-12-22 15:28:21,254 Stage-1 map = 38%, reduce = 0%, Cumulative CPU 319.32 sec 2020-12-22 15:28:28,412 Stage-1 map = 39%, reduce = 0%, Cumulative CPU 333.58 sec 2020-12-22 15:28:30,451 Stage-1 map = 40%, reduce = 0%, Cumulative CPU 340.7 sec 2020-12-22 15:28:35,566 Stage-1 map = 41%, reduce = 0%, Cumulative CPU 347.92 sec 2020-12-22 15:28:37,615 Stage-1 map = 42%, reduce = 0%, Cumulative CPU 355.18 sec 2020-12-22 15:28:42,760 Stage-1 map = 43%, reduce = 0%, Cumulative CPU 362.32 sec 2020-12-22 15:28:45,837 Stage-1 map = 44%, reduce = 0%, Cumulative CPU 369.46 sec 2020-12-22 15:28:53,000 Stage-1 map = 45%, reduce = 0%, Cumulative CPU 383.79 sec 2020-12-22 15:28:57,099 Stage-1 map = 46%, reduce = 0%, Cumulative CPU 390.96 sec 2020-12-22 15:29:00,171 Stage-1 map = 47%, reduce = 0%, Cumulative CPU 398.23 sec 2020-12-22 15:29:03,239 Stage-1 map = 48%, reduce = 0%, Cumulative CPU 405.41 sec 2020-12-22 15:29:07,333 Stage-1 map = 49%, reduce = 0%, Cumulative CPU 412.59 sec 2020-12-22 15:29:09,381 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 418.3 sec 2020-12-22 15:29:17,571 Stage-1 map = 51%, reduce = 0%, Cumulative CPU 433.16 sec 2020-12-22 15:29:21,670 Stage-1 map = 52%, reduce = 0%, Cumulative CPU 440.77 sec 2020-12-22 15:29:24,739 Stage-1 map = 53%, reduce = 0%, Cumulative CPU 448.0 sec 2020-12-22 15:29:29,852 Stage-1 map = 54%, reduce = 0%, Cumulative CPU 455.16 sec 2020-12-22 15:29:32,923 Stage-1 map = 55%, reduce = 0%, Cumulative CPU 462.21 sec 2020-12-22 15:29:37,015 Stage-1 map = 56%, reduce = 0%, Cumulative CPU 469.44 sec 2020-12-22 15:29:45,208 Stage-1 map = 57%, reduce = 0%, Cumulative CPU 484.18 sec 2020-12-22 15:29:46,232 Stage-1 map = 58%, reduce = 0%, Cumulative CPU 491.29 sec 2020-12-22 15:29:53,402 Stage-1 map = 60%, reduce = 0%, Cumulative CPU 505.98 sec 2020-12-22 15:30:00,561 Stage-1 map = 62%, reduce = 0%, Cumulative CPU 520.64 sec 2020-12-22 15:30:08,750 Stage-1 map = 63%, reduce = 0%, Cumulative CPU 535.06 sec 2020-12-22 15:30:16,921 Stage-1 map = 65%, reduce = 0%, Cumulative CPU 549.64 sec 2020-12-22 15:30:25,110 Stage-1 map = 67%, reduce = 0%, Cumulative CPU 564.1 sec 2020-12-22 15:30:33,297 Stage-1 map = 68%, reduce = 0%, Cumulative CPU 578.87 sec 2020-12-22 15:30:40,468 Stage-1 map = 69%, reduce = 0%, Cumulative CPU 586.16 sec 2020-12-22 15:30:41,492 Stage-1 map = 70%, reduce = 0%, Cumulative CPU 593.42 sec 2020-12-22 15:30:48,656 Stage-1 map = 72%, reduce = 0%, Cumulative CPU 607.97 sec 2020-12-22 15:30:56,844 Stage-1 map = 74%, reduce = 0%, Cumulative CPU 622.63 sec 2020-12-22 15:31:05,035 Stage-1 map = 75%, reduce = 0%, Cumulative CPU 637.24 sec 2020-12-22 15:31:11,176 Stage-1 map = 76%, reduce = 0%, Cumulative CPU 644.51 sec 2020-12-22 15:31:13,226 Stage-1 map = 77%, reduce = 0%, Cumulative CPU 651.73 sec 2020-12-22 15:31:18,340 Stage-1 map = 78%, reduce = 0%, Cumulative CPU 658.77 sec 2020-12-22 15:31:21,408 Stage-1 map = 79%, reduce = 0%, Cumulative CPU 666.09 sec 2020-12-22 15:31:29,616 Stage-1 map = 80%, reduce = 0%, Cumulative CPU 680.5 sec 2020-12-22 15:31:31,665 Stage-1 map = 81%, reduce = 0%, Cumulative CPU 687.82 sec 2020-12-22 15:31:36,784 Stage-1 map = 82%, reduce = 0%, Cumulative CPU 694.88 sec 2020-12-22 15:31:39,855 Stage-1 map = 83%, reduce = 0%, Cumulative CPU 701.86 sec 2020-12-22 15:31:44,974 Stage-1 map = 84%, reduce = 0%, Cumulative CPU 709.08 sec 2020-12-22 15:31:47,020 Stage-1 map = 85%, reduce = 0%, Cumulative CPU 716.29 sec 2020-12-22 15:31:55,214 Stage-1 map = 86%, reduce = 0%, Cumulative CPU 730.6 sec 2020-12-22 15:32:00,337 Stage-1 map = 87%, reduce = 0%, Cumulative CPU 737.95 sec 2020-12-22 15:32:02,385 Stage-1 map = 88%, reduce = 0%, Cumulative CPU 745.08 sec 2020-12-22 15:32:08,514 Stage-1 map = 89%, reduce = 0%, Cumulative CPU 752.31 sec 2020-12-22 15:32:09,538 Stage-1 map = 90%, reduce = 0%, Cumulative CPU 759.48 sec 2020-12-22 15:32:15,677 Stage-1 map = 91%, reduce = 0%, Cumulative CPU 766.75 sec 2020-12-22 15:32:22,843 Stage-1 map = 92%, reduce = 0%, Cumulative CPU 781.36 sec 2020-12-22 15:32:23,866 Stage-1 map = 93%, reduce = 0%, Cumulative CPU 788.45 sec 2020-12-22 15:32:30,009 Stage-1 map = 94%, reduce = 0%, Cumulative CPU 795.75 sec 2020-12-22 15:32:32,057 Stage-1 map = 95%, reduce = 0%, Cumulative CPU 802.85 sec 2020-12-22 15:32:38,211 Stage-1 map = 96%, reduce = 0%, Cumulative CPU 810.04 sec 2020-12-22 15:32:40,254 Stage-1 map = 97%, reduce = 0%, Cumulative CPU 817.29 sec 2020-12-22 15:32:47,426 Stage-1 map = 98%, reduce = 0%, Cumulative CPU 831.76 sec 2020-12-22 15:32:52,545 Stage-1 map = 99%, reduce = 0%, Cumulative CPU 838.89 sec 2020-12-22 15:32:55,612 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 846.16 sec MapReduce Total cumulative CPU time: 14 minutes 6 seconds 160 msec Ended Job = job_1606698967173_0353 MapReduce Jobs Launched: Stage-Stage-1: Map: 117 Cumulative CPU: 846.16 sec HDFS Read: 31436963448 HDFS Write: 10228 HDFS EC Read: 0 SUCCESS Total MapReduce CPU Time Spent: 14 minutes 6 seconds 160 msec OK ods_fact_sale.id ods_fact_sale.sale_date ods_fact_sale.prod_name ods_fact_sale.sale_nums 10000 2012-03-11 00:00:00.0 PROD5 96 Time taken: 457.342 seconds, Fetched: 1 row(s) -- 使用索引 hive> > SET hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat; hive> SET hive.optimize.index.filter=true; hive> SET hive.optimize.index.filter.compact.minsize=0; hive> set hive.index.compact.query.max.size=-1; hive> > > select * from ods_fact_sale where id = 10000; Query ID = root_20201222165727_6d0369ce-6498-4bed-8242-b52be9de77a8 Total jobs = 1 Launching Job 1 out of 1 Number of reduce tasks is set to 0 since there's no reduce operator Starting Job = job_1606698967173_0355, Tracking URL = http://hp1:8088/proxy/application_1606698967173_0355/ Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job -kill job_1606698967173_0355 Hadoop job information for Stage-1: number of mappers: 237; number of reducers: 0 2020-12-22 16:57:34,782 Stage-1 map = 0%, reduce = 0% 2020-12-22 16:57:44,081 Stage-1 map = 1%, reduce = 0%, Cumulative CPU 11.95 sec 2020-12-22 16:57:51,278 Stage-1 map = 2%, reduce = 0%, Cumulative CPU 23.66 sec 2020-12-22 16:57:56,418 Stage-1 map = 3%, reduce = 0%, Cumulative CPU 35.37 sec 2020-12-22 16:58:07,711 Stage-1 map = 4%, reduce = 0%, Cumulative CPU 52.9 sec 2020-12-22 16:58:13,853 Stage-1 map = 5%, reduce = 0%, Cumulative CPU 64.76 sec 2020-12-22 16:58:22,039 Stage-1 map = 6%, reduce = 0%, Cumulative CPU 82.49 sec 2020-12-22 16:58:28,188 Stage-1 map = 7%, reduce = 0%, Cumulative CPU 94.33 sec 2020-12-22 16:58:34,335 Stage-1 map = 8%, reduce = 0%, Cumulative CPU 105.97 sec 2020-12-22 16:58:43,550 Stage-1 map = 9%, reduce = 0%, Cumulative CPU 123.51 sec 2020-12-22 16:58:49,692 Stage-1 map = 10%, reduce = 0%, Cumulative CPU 135.15 sec 2020-12-22 16:58:55,842 Stage-1 map = 11%, reduce = 0%, Cumulative CPU 147.0 sec 2020-12-22 16:59:04,028 Stage-1 map = 12%, reduce = 0%, Cumulative CPU 164.52 sec 2020-12-22 16:59:10,178 Stage-1 map = 13%, reduce = 0%, Cumulative CPU 176.03 sec 2020-12-22 16:59:16,310 Stage-1 map = 14%, reduce = 0%, Cumulative CPU 187.32 sec 2020-12-22 16:59:26,541 Stage-1 map = 15%, reduce = 0%, Cumulative CPU 204.7 sec 2020-12-22 16:59:32,680 Stage-1 map = 16%, reduce = 0%, Cumulative CPU 216.24 sec 2020-12-22 16:59:39,835 Stage-1 map = 17%, reduce = 0%, Cumulative CPU 233.71 sec 2020-12-22 16:59:45,971 Stage-1 map = 18%, reduce = 0%, Cumulative CPU 245.21 sec 2020-12-22 16:59:52,106 Stage-1 map = 19%, reduce = 0%, Cumulative CPU 256.37 sec 2020-12-22 17:00:03,363 Stage-1 map = 20%, reduce = 0%, Cumulative CPU 274.03 sec 2020-12-22 17:00:09,513 Stage-1 map = 21%, reduce = 0%, Cumulative CPU 285.58 sec 2020-12-22 17:00:15,656 Stage-1 map = 22%, reduce = 0%, Cumulative CPU 303.05 sec 2020-12-22 17:00:21,796 Stage-1 map = 23%, reduce = 0%, Cumulative CPU 314.47 sec 2020-12-22 17:00:27,932 Stage-1 map = 24%, reduce = 0%, Cumulative CPU 325.46 sec 2020-12-22 17:00:39,192 Stage-1 map = 25%, reduce = 0%, Cumulative CPU 342.82 sec 2020-12-22 17:00:45,334 Stage-1 map = 26%, reduce = 0%, Cumulative CPU 354.32 sec 2020-12-22 17:00:51,479 Stage-1 map = 27%, reduce = 0%, Cumulative CPU 365.88 sec 2020-12-22 17:00:58,635 Stage-1 map = 28%, reduce = 0%, Cumulative CPU 383.29 sec 2020-12-22 17:01:04,782 Stage-1 map = 29%, reduce = 0%, Cumulative CPU 395.04 sec 2020-12-22 17:01:09,911 Stage-1 map = 30%, reduce = 0%, Cumulative CPU 406.56 sec 2020-12-22 17:01:21,149 Stage-1 map = 31%, reduce = 0%, Cumulative CPU 423.83 sec 2020-12-22 17:01:27,298 Stage-1 map = 32%, reduce = 0%, Cumulative CPU 435.55 sec 2020-12-22 17:01:34,472 Stage-1 map = 33%, reduce = 0%, Cumulative CPU 452.98 sec 2020-12-22 17:01:40,616 Stage-1 map = 34%, reduce = 0%, Cumulative CPU 464.47 sec 2020-12-22 17:01:46,756 Stage-1 map = 35%, reduce = 0%, Cumulative CPU 476.0 sec 2020-12-22 17:01:57,998 Stage-1 map = 36%, reduce = 0%, Cumulative CPU 499.32 sec 2020-12-22 17:02:03,113 Stage-1 map = 37%, reduce = 0%, Cumulative CPU 505.22 sec 2020-12-22 17:02:09,262 Stage-1 map = 38%, reduce = 0%, Cumulative CPU 516.84 sec 2020-12-22 17:02:16,418 Stage-1 map = 39%, reduce = 0%, Cumulative CPU 534.09 sec 2020-12-22 17:02:22,550 Stage-1 map = 40%, reduce = 0%, Cumulative CPU 545.71 sec 2020-12-22 17:02:28,696 Stage-1 map = 41%, reduce = 0%, Cumulative CPU 557.23 sec 2020-12-22 17:02:38,940 Stage-1 map = 42%, reduce = 0%, Cumulative CPU 574.68 sec 2020-12-22 17:02:45,070 Stage-1 map = 43%, reduce = 0%, Cumulative CPU 586.31 sec 2020-12-22 17:02:52,230 Stage-1 map = 44%, reduce = 0%, Cumulative CPU 603.61 sec 2020-12-22 17:02:58,369 Stage-1 map = 45%, reduce = 0%, Cumulative CPU 615.2 sec 2020-12-22 17:03:04,502 Stage-1 map = 46%, reduce = 0%, Cumulative CPU 626.88 sec 2020-12-22 17:03:15,736 Stage-1 map = 47%, reduce = 0%, Cumulative CPU 644.23 sec 2020-12-22 17:03:21,867 Stage-1 map = 48%, reduce = 0%, Cumulative CPU 655.32 sec 2020-12-22 17:03:28,012 Stage-1 map = 49%, reduce = 0%, Cumulative CPU 666.74 sec 2020-12-22 17:03:35,180 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 683.84 sec 2020-12-22 17:03:40,291 Stage-1 map = 51%, reduce = 0%, Cumulative CPU 695.25 sec 2020-12-22 17:03:51,544 Stage-1 map = 52%, reduce = 0%, Cumulative CPU 712.8 sec 2020-12-22 17:03:57,688 Stage-1 map = 53%, reduce = 0%, Cumulative CPU 724.58 sec 2020-12-22 17:04:03,803 Stage-1 map = 54%, reduce = 0%, Cumulative CPU 736.14 sec 2020-12-22 17:04:10,966 Stage-1 map = 55%, reduce = 0%, Cumulative CPU 753.53 sec 2020-12-22 17:04:17,094 Stage-1 map = 56%, reduce = 0%, Cumulative CPU 764.89 sec 2020-12-22 17:04:23,248 Stage-1 map = 57%, reduce = 0%, Cumulative CPU 776.53 sec 2020-12-22 17:04:33,482 Stage-1 map = 58%, reduce = 0%, Cumulative CPU 794.16 sec 2020-12-22 17:04:39,621 Stage-1 map = 59%, reduce = 0%, Cumulative CPU 805.71 sec 2020-12-22 17:04:46,787 Stage-1 map = 60%, reduce = 0%, Cumulative CPU 822.53 sec 2020-12-22 17:04:52,933 Stage-1 map = 61%, reduce = 0%, Cumulative CPU 834.34 sec 2020-12-22 17:04:59,072 Stage-1 map = 62%, reduce = 0%, Cumulative CPU 846.03 sec 2020-12-22 17:05:10,317 Stage-1 map = 63%, reduce = 0%, Cumulative CPU 863.38 sec 2020-12-22 17:05:15,432 Stage-1 map = 64%, reduce = 0%, Cumulative CPU 875.03 sec 2020-12-22 17:05:21,579 Stage-1 map = 65%, reduce = 0%, Cumulative CPU 886.8 sec 2020-12-22 17:05:28,732 Stage-1 map = 66%, reduce = 0%, Cumulative CPU 904.16 sec 2020-12-22 17:05:34,870 Stage-1 map = 67%, reduce = 0%, Cumulative CPU 915.76 sec 2020-12-22 17:05:41,000 Stage-1 map = 68%, reduce = 0%, Cumulative CPU 927.28 sec 2020-12-22 17:05:52,264 Stage-1 map = 69%, reduce = 0%, Cumulative CPU 944.79 sec 2020-12-22 17:05:58,401 Stage-1 map = 70%, reduce = 0%, Cumulative CPU 956.18 sec 2020-12-22 17:06:04,544 Stage-1 map = 71%, reduce = 0%, Cumulative CPU 973.61 sec 2020-12-22 17:06:10,688 Stage-1 map = 72%, reduce = 0%, Cumulative CPU 985.31 sec 2020-12-22 17:06:16,822 Stage-1 map = 73%, reduce = 0%, Cumulative CPU 996.76 sec 2020-12-22 17:06:28,072 Stage-1 map = 74%, reduce = 0%, Cumulative CPU 1014.18 sec 2020-12-22 17:06:34,210 Stage-1 map = 75%, reduce = 0%, Cumulative CPU 1025.82 sec 2020-12-22 17:06:40,339 Stage-1 map = 76%, reduce = 0%, Cumulative CPU 1037.42 sec 2020-12-22 17:06:47,496 Stage-1 map = 77%, reduce = 0%, Cumulative CPU 1054.79 sec 2020-12-22 17:06:52,614 Stage-1 map = 78%, reduce = 0%, Cumulative CPU 1066.27 sec 2020-12-22 17:07:03,879 Stage-1 map = 79%, reduce = 0%, Cumulative CPU 1083.42 sec 2020-12-22 17:07:10,021 Stage-1 map = 80%, reduce = 0%, Cumulative CPU 1095.07 sec 2020-12-22 17:07:16,162 Stage-1 map = 81%, reduce = 0%, Cumulative CPU 1106.66 sec 2020-12-22 17:07:23,329 Stage-1 map = 82%, reduce = 0%, Cumulative CPU 1123.82 sec 2020-12-22 17:07:29,475 Stage-1 map = 83%, reduce = 0%, Cumulative CPU 1135.42 sec 2020-12-22 17:07:35,621 Stage-1 map = 84%, reduce = 0%, Cumulative CPU 1147.24 sec 2020-12-22 17:07:45,851 Stage-1 map = 85%, reduce = 0%, Cumulative CPU 1164.55 sec 2020-12-22 17:07:51,997 Stage-1 map = 86%, reduce = 0%, Cumulative CPU 1175.9 sec 2020-12-22 17:07:59,167 Stage-1 map = 87%, reduce = 0%, Cumulative CPU 1193.34 sec 2020-12-22 17:08:05,300 Stage-1 map = 88%, reduce = 0%, Cumulative CPU 1204.96 sec 2020-12-22 17:08:11,440 Stage-1 map = 89%, reduce = 0%, Cumulative CPU 1216.65 sec 2020-12-22 17:08:22,708 Stage-1 map = 90%, reduce = 0%, Cumulative CPU 1234.08 sec 2020-12-22 17:08:27,837 Stage-1 map = 91%, reduce = 0%, Cumulative CPU 1245.61 sec 2020-12-22 17:08:33,979 Stage-1 map = 92%, reduce = 0%, Cumulative CPU 1257.17 sec 2020-12-22 17:08:41,147 Stage-1 map = 93%, reduce = 0%, Cumulative CPU 1274.05 sec 2020-12-22 17:08:47,299 Stage-1 map = 94%, reduce = 0%, Cumulative CPU 1285.58 sec 2020-12-22 17:08:53,447 Stage-1 map = 95%, reduce = 0%, Cumulative CPU 1297.34 sec 2020-12-22 17:09:04,704 Stage-1 map = 96%, reduce = 0%, Cumulative CPU 1314.97 sec 2020-12-22 17:09:09,823 Stage-1 map = 97%, reduce = 0%, Cumulative CPU 1326.53 sec 2020-12-22 17:09:16,989 Stage-1 map = 98%, reduce = 0%, Cumulative CPU 1343.78 sec 2020-12-22 17:09:23,135 Stage-1 map = 99%, reduce = 0%, Cumulative CPU 1354.25 sec 2020-12-22 17:09:31,324 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1366.84 sec MapReduce Total cumulative CPU time: 22 minutes 46 seconds 840 msec Ended Job = job_1606698967173_0355 MapReduce Jobs Launched: Stage-Stage-1: Map: 237 Cumulative CPU: 1366.84 sec HDFS Read: 31437557556 HDFS Write: 20668 HDFS EC Read: 0 SUCCESS Total MapReduce CPU Time Spent: 22 minutes 46 seconds 840 msec OK ods_fact_sale.id ods_fact_sale.sale_date ods_fact_sale.prod_name ods_fact_sale.sale_nums 10000 2012-03-11 00:00:00.0 PROD5 96 Time taken: 725.125 seconds, Fetched: 1 row(s) hive> - -- 使用spark更快 -- 删除索引后spark依旧很快,证明spark未使用到索引 hive> > > set hive.execution.engine=spark; hive> select * from ods_fact_sale where id = 10000; Query ID = root_20201222171114_26a785b8-b6a9-4780-96d2-f868d89d0dc9 Total jobs = 1 Launching Job 1 out of 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer= In order to limit the maximum number of reducers: set hive.exec.reducers.max= In order to set a constant number of reducers: set mapreduce.job.reduces= Running with YARN Application = application_1606698967173_0356 Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/yarn application -kill application_1606698967173_0356 Hive on Spark Session Web UI URL: http://hp4:36726 Query Hive on Spark job[0] stages: [0] Spark job[0] status = RUNNING -------------------------------------------------------------------------------------- STAGES ATTEMPT STATUS TOTAL COMPLETED RUNNING PENDING FAILED -------------------------------------------------------------------------------------- Stage-0 ........ 0 FINISHED 237 237 0 0 0 -------------------------------------------------------------------------------------- STAGES: 01/01 [==========================>>] 100% ELAPSED TIME: 61.24 s -------------------------------------------------------------------------------------- Spark job[0] finished successfully in 61.24 second(s) Spark Job[0] Metrics: TaskDurationTime: 439651, ExecutorCpuTime: 343803, JvmGCTime: 5379, BytesRead / RecordsRead: 31437553053 / 767830000, BytesReadEC: 0, ShuffleTotalBytesRead / ShuffleRecordsRead: 0 / 0, ShuffleBytesWritten / ShuffleRecordsWritten: 0 / 0 OK ods_fact_sale.id ods_fact_sale.sale_date ods_fact_sale.prod_name ods_fact_sale.sale_nums 10000 2012-03-11 00:00:00.0 PROD5 96 Time taken: 79.735 seconds, Fetched: 1 row(s) hive> > DROP INDEX ods_fact_sale_idx02 on ods_fact_sale; OK Time taken: 0.112 seconds hive> select * from ods_fact_sale where id = 10000; Query ID = root_20201222171254_29ea80f3-e924-439d-b423-8b55363382bb Total jobs = 1 Launching Job 1 out of 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer= In order to limit the maximum number of reducers: set hive.exec.reducers.max= In order to set a constant number of reducers: set mapreduce.job.reduces= Running with YARN Application = application_1606698967173_0356 Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/yarn application -kill application_1606698967173_0356 Hive on Spark Session Web UI URL: http://hp4:36726 Query Hive on Spark job[1] stages: [1] Spark job[1] status = RUNNING -------------------------------------------------------------------------------------- STAGES ATTEMPT STATUS TOTAL COMPLETED RUNNING PENDING FAILED -------------------------------------------------------------------------------------- Stage-1 ........ 0 FINISHED 237 237 0 0 0 -------------------------------------------------------------------------------------- STAGES: 01/01 [==========================>>] 100% ELAPSED TIME: 50.13 s -------------------------------------------------------------------------------------- Spark job[1] finished successfully in 50.13 second(s) Spark Job[1] Metrics: TaskDurationTime: 394418, ExecutorCpuTime: 336367, JvmGCTime: 3504, BytesRead / RecordsRead: 31437707340 / 767830000, BytesReadEC: 0, ShuffleTotalBytesRead / ShuffleRecordsRead: 0 / 0, ShuffleBytesWritten / ShuffleRecordsWritten: 0 / 0 OK ods_fact_sale.id ods_fact_sale.sale_date ods_fact_sale.prod_name ods_fact_sale.sale_nums 10000 2012-03-11 00:00:00.0 PROD5 96 Time taken: 50.279 seconds, Fetched: 1 row(s) hive>


【本文地址】

公司简介

联系我们

今日新闻

    推荐新闻

      专题文章
        CopyRight 2018-2019 实验室设备网 版权所有